{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LeNet\n",
    "\n",
    "卷积神经网络就是含卷积层的网络。本节里我们将介绍一个早期用来识别手写数字图像的卷积神经网络：LeNet [1]。这个名字来源于LeNet论文的第一作者Yann LeCun。LeNet展示了通过梯度下降训练卷积神经网络可以达到手写数字识别在当时最先进的结果。这个奠基性的工作第一次将卷积神经网络推上舞台，为世人所知。LeNet的网络结构如下图所示。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LeNet模型\n",
    "\n",
    "LeNet分为卷积层块和全连接层块两个部分。下面我们分别介绍这两个模块。\n",
    "\n",
    "卷积层块里的基本单位是卷积层后接最大池化层：卷积层用来识别图像里的空间模式，如线条和物体局部，之后的最大池化层则用来降低卷积层对位置的敏感性。卷积层块由两个这样的基本单位重复堆叠构成。在卷积层块中，每个卷积层都使用$5\\times 5$的窗口，并在输出上使用sigmoid激活函数。第一个卷积层输出通道数为6，第二个卷积层输出通道数则增加到16。这是因为第二个卷积层比第一个卷积层的输入的高和宽要小，所以增加输出通道使两个卷积层的参数尺寸类似。卷积层块的两个最大池化层的窗口形状均为$2\\times 2$，且步幅为2。由于池化窗口与步幅形状相同，池化窗口在输入上每次滑动所覆盖的区域互不重叠。\n",
    "\n",
    "卷积层块的输出形状为(批量大小, 通道, 高, 宽)。当卷积层块的输出传入全连接层块时，全连接层块会将小批量中每个样本变平（flatten）。也就是说，全连接层的输入形状将变成二维，其中第一维是小批量中的样本，第二维是每个样本变平后的向量表示，且向量长度为通道、高和宽的乘积。全连接层块含3个全连接层。它们的输出个数分别是120、84和10，其中10为输出的类别个数。\n",
    "\n",
    "下面我们通过`Sequential`类来实现LeNet模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch import optim\n",
    "\n",
    "import sys\n",
    "sys.path.append(\"..\") \n",
    "torch.backends.cudnn.benchmark = True\n",
    "torch.backends.cudnn.deterministic = False\n",
    "torch.backends.cudnn.enabled = True\n",
    "\n",
    "\n",
    "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
    "\n",
    "class LeNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(LeNet, self).__init__()\n",
    "        self.Sequential_cov = nn.Sequential(\n",
    "            nn.Conv2d(1, 6, 5) ,# in_channels, out_channels, kernel_size\n",
    "            nn.Sigmoid(),\n",
    "            nn.MaxPool2d(2 ,2 ) ,#  # kernel_size, stride\n",
    "            nn.Conv2d(6, 16, 5),\n",
    "            nn.Sigmoid(),\n",
    "            nn.MaxPool2d(2 ,2 )) #  # kernel_size, stride\n",
    "        self.Sequential_linear = nn.Sequential(\n",
    "            nn.Linear(16*4*4, 120),\n",
    "            nn.Sigmoid(),\n",
    "            nn.Linear(120,84),\n",
    "            nn.Sigmoid(),\n",
    "            nn.Linear(84,10)\n",
    "        )\n",
    "    \n",
    "    def forward(self, x):\n",
    "        x = self.Sequential_cov(x)\n",
    "        return self.Sequential_linear(x.view(x.shape[0], -1))\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 0.4499,  0.0426,  0.4224, -0.1662,  0.3933, -0.0846, -0.0712, -0.3690,\n",
      "          0.1842,  0.0105],\n",
      "        [ 0.4502,  0.0422,  0.4224, -0.1661,  0.3931, -0.0846, -0.0711, -0.3688,\n",
      "          0.1843,  0.0107],\n",
      "        [ 0.4501,  0.0423,  0.4225, -0.1659,  0.3931, -0.0846, -0.0711, -0.3687,\n",
      "          0.1842,  0.0107],\n",
      "        [ 0.4500,  0.0424,  0.4225, -0.1662,  0.3933, -0.0846, -0.0711, -0.3689,\n",
      "          0.1844,  0.0105],\n",
      "        [ 0.4500,  0.0425,  0.4227, -0.1657,  0.3932, -0.0848, -0.0711, -0.3688,\n",
      "          0.1843,  0.0105]], grad_fn=<AddmmBackward>)\n"
     ]
    }
   ],
   "source": [
    "input_test = torch.randn(5,1,28,28)\n",
    "net = LeNet()\n",
    "out = net(input_test)\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LeNet(\n",
      "  (Sequential_cov): Sequential(\n",
      "    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (1): Sigmoid()\n",
      "    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (4): Sigmoid()\n",
      "    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (Sequential_linear): Sequential(\n",
      "    (0): Linear(in_features=256, out_features=120, bias=True)\n",
      "    (1): Sigmoid()\n",
      "    (2): Linear(in_features=120, out_features=84, bias=True)\n",
      "    (3): Sigmoid()\n",
      "    (4): Linear(in_features=84, out_features=10, bias=True)\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(net)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "batch_size = 256\n",
    "import dl_utils\n",
    "train_iter, test_iter = dl_utils.load_data_fashion_mnist(batch_size=batch_size)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为卷积神经网络计算比多层感知机要复杂，建议使用GPU来加速计算。因此，我们对3.6节（softmax回归的从零开始实现）中描述的`evaluate_accuracy`函数略作修改，使其支持GPU计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "def evaluate_accuracy(data_iter, net, \n",
    "                      device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')):\n",
    "    acc_sum, n = 0.0, 0\n",
    "    with torch.no_grad():\n",
    "        for X, y in data_iter:\n",
    "            net.eval() # 评估模式, 这会关闭dropout\n",
    "            acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()\n",
    "            net.train() # 改回训练模式\n",
    "            n += y.shape[0]\n",
    "    return acc_sum / n\n",
    "\n",
    "\n",
    "def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):\n",
    "    net = net.to(device)\n",
    "    print(\"training on \", device)\n",
    "    loss = torch.nn.CrossEntropyLoss()\n",
    "    batch_count = 0\n",
    "    for epoch in range(num_epochs):\n",
    "        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()\n",
    "        for X, y in train_iter:\n",
    "            X = X.to(device)\n",
    "            y = y.to(device)\n",
    "            y_hat = net(X)\n",
    "            l = loss(y_hat, y)\n",
    "            optimizer.zero_grad()\n",
    "            l.backward()\n",
    "            optimizer.step()\n",
    "            train_l_sum += l.cpu().item()\n",
    "            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()\n",
    "            n += y.shape[0]\n",
    "            batch_count += 1\n",
    "        test_acc = evaluate_accuracy(test_iter, net)\n",
    "        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'\n",
    "              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "training on  cuda\n",
      "epoch 1, loss 1.9078, train acc 0.295, test acc 0.579, time 5.7 sec\n",
      "epoch 2, loss 0.4722, train acc 0.644, test acc 0.694, time 4.7 sec\n",
      "epoch 3, loss 0.2513, train acc 0.720, test acc 0.737, time 4.4 sec\n",
      "epoch 4, loss 0.1638, train acc 0.749, test acc 0.753, time 4.4 sec\n",
      "epoch 5, loss 0.1202, train acc 0.765, test acc 0.762, time 4.3 sec\n"
     ]
    }
   ],
   "source": [
    "lr, num_epochs = 0.001, 5\n",
    "optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
    "train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不用cudnn \n",
    "training on  cuda\n",
    "epoch 1, loss 1.8613, train acc 0.319, test acc 0.580, time 9.6 sec\n",
    "epoch 2, loss 0.4780, train acc 0.628, test acc 0.687, time 4.1 sec\n",
    "epoch 3, loss 0.2591, train acc 0.713, test acc 0.729, time 4.0 sec\n",
    "epoch 4, loss 0.1712, train acc 0.740, test acc 0.746, time 4.0 sec\n",
    "epoch 5, loss 0.1261, train acc 0.756, test acc 0.759, time 4.0 sec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "- 卷积神经网络就是含卷积层的网络。\n",
    "- LeNet交替使用卷积层和最大池化层后接全连接层来进行图像分类。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
